GHOSTX: An Improved Sequence Homology Search Algorithm Using a Query Suffix Array and a Database Suffix Array

Authors

  • Shuji Suzuki
  • Masanori Kakuta
  • Takashi Ishida
  • Yutaka Akiyama
Abstract

In metagenomic analyses, DNA sequences are translated into protein-coding sequences and then assigned to protein families, because protein-level comparison provides the sensitivity these analyses need. However, the huge amounts of sequence data now being produced make even routine homology searches with BLASTX computationally expensive. We designed a new homology search algorithm that finds seed sequences using the suffix arrays of both the query and the database, and implemented it as GHOSTX. GHOSTX ran approximately 131 to 165 times faster than a BLASTX search at similar levels of sensitivity. Sequencing technology continues to improve, and sequencers are producing ever larger quantities of data; this explosion of sequence data makes analysis with contemporary tools increasingly difficult. We offer GHOSTX as a potential solution to this problem. GHOSTX is distributed under the BSD 2-clause license and is available for download at http://www.bi.cs.titech.ac.jp/ghostx/.
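The seed-finding idea in the abstract can be illustrated with a minimal sketch. It assumes exact, fixed-length seeds of length `k` over plain strings; GHOSTX's actual seed criterion and its extension steps are not described on this page and are not reproduced here. The function names `suffix_array` and `find_seeds` are hypothetical. The point is only that, once both the query and the database are sorted into suffix arrays, every shared k-mer can be found by walking the two arrays together in one merge-like pass.

```python
# Hypothetical sketch: exact k-mer seeds only, NOT GHOSTX's real seed criterion.

def suffix_array(s: str) -> list[int]:
    """Naive suffix array: 0-based start positions sorted by suffix text."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def find_seeds(query: str, db: str, k: int) -> list[tuple[int, int]]:
    """Return sorted (query_pos, db_pos) pairs whose length-k substrings match.

    Sorting suffixes also sorts their k-prefixes, so equal k-mers form
    contiguous runs in each array and can be matched like a merge step.
    """
    # Keep only suffixes long enough to contain a full k-mer.
    qsa = [i for i in suffix_array(query) if i + k <= len(query)]
    dsa = [j for j in suffix_array(db) if j + k <= len(db)]
    seeds = []
    a = b = 0
    while a < len(qsa) and b < len(dsa):
        qk = query[qsa[a]:qsa[a] + k]
        dk = db[dsa[b]:dsa[b] + k]
        if qk < dk:
            a += 1          # query k-mer is smaller: advance in the query array
        elif qk > dk:
            b += 1          # database k-mer is smaller: advance in the db array
        else:
            # Find the full run of suffixes sharing this k-mer in each array.
            a_end = a
            while a_end < len(qsa) and query[qsa[a_end]:qsa[a_end] + k] == qk:
                a_end += 1
            b_end = b
            while b_end < len(dsa) and db[dsa[b_end]:dsa[b_end] + k] == qk:
                b_end += 1
            # Every query occurrence pairs with every database occurrence.
            for i in qsa[a:a_end]:
                for j in dsa[b:b_end]:
                    seeds.append((i, j))
            a, b = a_end, b_end
    return sorted(seeds)


if __name__ == "__main__":
    print(find_seeds("MKVLA", "AMKVLAKV", 3))
```

For this input the script prints `[(0, 1), (1, 2), (2, 3)]`: the query k-mers MKV, KVL and VLA occur at database positions 1, 2 and 3. The naive construction sorts whole suffix slices and is meant only for illustration; practical tools build suffix arrays with much faster algorithms.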

Similar resources

Improved Processing of Path Query on RDF Data Using Suffix Array

RDF is a recommended standard for describing additional semantic information about resources on the Semantic Web. Matono et al. proposed an indexing and query processing scheme for path-based RDF queries using a suffix array. In this paper, we point out some issues with the previous approach. We propose an improved indexing and query processing scheme to reduce the binary search space and the overhead cau...

Pairwise sequence alignment using bio-database compression by improved fine tuned enhanced suffix array

Sequence alignment is a bioinformatics application that determines the degree of similarity between nucleotide sequences that are assumed to share a common ancestry. This sequence alignment method reads a query sequence from the user, aligns it against large genomic sequence data sets, and locates targets that are similar to the input query sequence. Existing accurate algor...

Bio-database compression using enhanced suffix array for pairwise sequence alignment

Sequence alignment is a bioinformatics application that determines the degree of similarity between nucleotide or amino acid sequences that are assumed to share a common ancestry. This sequence alignment method reads a query sequence from the user, aligns it against large genomic sequence data sets, and locates targets that are similar to the input query sequence. Tradition...

Ultra-fast Multiple Genome Sequence Matching Using GPU

In this paper, a contrastive evaluation of massively parallel implementations of the suffix tree and the suffix array for accelerating genome sequence matching is presented, based on an Intel Core i7 3770K quad-core CPU and an NVIDIA GeForce GTX680 GPU. Besides the suffix array occupying only approximately 20% to 30% of the space of the suffix tree, the coalesced binary search and tile optimization make suffix array clearl...

Efficient de novo assembly of large genomes using compressed data structures - Supplemental Materials and Methods

The suffix array is a compact representation of the lexicographic ordering of the suffixes of a text [1]. Each element of the array is an index into the original string; SA_X[i] = j indicates that the suffix starting at position j in X is the i-th lowest suffix of X. As an example, consider the string T = AGATCGATA$. The suffix array of T is SA_T = [10, 9, 1, 7, 3, 5, 6, 2, 8, 4]. As the suffix a...

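The suffix array example in the last snippet above (T = AGATCGATA$) can be checked with a short sketch. It builds the array naively by sorting suffix slices and reports 1-based positions to match the SA_T notation; Python compares characters by code point, so `$` sorts before every letter, matching the convention that `$` is the lowest symbol. It also shows the standard binary search for a pattern over a suffix array, the kind of lookup the GPU and RDF entries above aim to speed up or narrow. The function names `suffix_array_1based` and `occurrences` are hypothetical.

```python
# Hypothetical helpers illustrating the suffix array snippet above.

def suffix_array_1based(t: str) -> list[int]:
    """Naive suffix array with 1-based start positions, matching SA_T."""
    return [i + 1 for i in sorted(range(len(t)), key=lambda i: t[i:])]


def occurrences(t: str, sa: list[int], pattern: str) -> list[int]:
    """Binary-search `sa` for suffixes of `t` that start with `pattern`.

    Returns the matching 1-based start positions in suffix-array order.
    """
    m = len(pattern)

    def prefix(rank: int) -> str:
        start = sa[rank] - 1          # convert the 1-based entry to 0-based
        return t[start:start + m]     # first m characters of that suffix

    # Lower bound: first rank whose m-prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first rank whose m-prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sa[first:lo]


if __name__ == "__main__":
    T = "AGATCGATA$"
    sa = suffix_array_1based(T)
    print(sa)                          # [10, 9, 1, 7, 3, 5, 6, 2, 8, 4]
    print(occurrences(T, sa, "GAT"))   # [6, 2]
```

The first printed list reproduces SA_T from the snippet. The search works because truncating sorted suffixes to their first m characters keeps them in non-decreasing order, so all matches form one contiguous run of ranks.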

Journal title:

Volume 9

Publication date: 2014